Method and apparatus for audio signal processing

ABSTRACT

A sampled digital audio signal is displayed on a spectrogram, in terms of frequency vs. time. An unwanted noise in the signal is visible in the spectrogram and the portion of the signal containing the unwanted noise can be selected using time and frequency constraints. An estimate for the signal within the selected portion is then interpolated on the basis of desired portions of the signal outside the time constraints defining the selected portion. The interpolated estimate can then be used to attenuate or remove the unwanted sound.

FIELD OF INVENTION

This invention concerns methods and apparatus for the attenuation or removal of unwanted sounds from recorded audio signals.

BACKGROUND TO THE INVENTION

The introduction of unwanted sounds is a common problem encountered in audio recordings. These unwanted sounds may occur acoustically at the time of the recording, or be introduced by subsequent signal corruption. Examples of acoustic unwanted sounds include the drone of an air conditioning unit, the sound of an object striking or being struck, coughs, and traffic noise. Examples of subsequent signal corruption include electronically induced lighting buzz, clicks caused by lost or corrupt samples in digital recordings, tape hiss, and the clicks and crackle endemic to recordings on disc.

Current audio restoration techniques include methods for the attenuation or removal of continuous sounds such as tape hiss and lighting buzz, and methods for the attenuation or removal of short duration impulsive disturbances such as record clicks and digital clicks. A detailed exposition of hiss reduction and click removal techniques can be found in the book 'Digital Audio Restoration' by Simon J. Godsill and Peter J. W. Rayner, which in its entirety is incorporated herein by reference.

SUMMARY OF THE INVENTION

The invention in its various aspects provides a method and apparatus as defined in the appended independent claims.

Preferred or advantageous features of the invention are set out in the dependent sub-claims.

In one aspect, the invention advantageously concerns itself with attenuating or eliminating the class of sounds that are neither continuous nor impulsive (i.e. of very short duration, such as 0.1 ms or less), and which current techniques cannot address. They are characterised by being localised both in time and in frequency. Preferably, the invention is applicable to attenuating or eliminating unwanted sounds of duration between 10 s and 1 ms, and particularly preferably between 2 s and 10 ms, or between 1 s and 100 ms.

Examples of such sounds include coughs, squeaky chairs, car horns, the sounds of page turns, the creaks of a piano pedal, the sounds of an object striking or being struck, short duration noise bursts (often heard on vintage disc recordings), acoustic anomalies caused by degradation to optical soundtracks, and magnetic tape drop-outs.

In a further aspect, the invention provides a method to perform interpolations that, in addition to being constrained to act upon a limited set of samples (constrained in time), are also constrained to act only upon one or more selected frequency bands, allowing the interpolated region within the band or bands to be attenuated or removed seamlessly and without adversely affecting the audio content outside of the selected band or bands.

Furthermore, standard interpolation techniques do not interpolate the noise content of the signal well. Methods exist that attempt to overcome this limitation, but they use a flawed assumption. A preferred embodiment of the invention thus provides an improved method for regenerating the noise content of the interpolated signal, for example by means of a template signal as described below. This, combined with the frequency band constraints, creates a powerful interpolation method that extends significantly the class of problems to which interpolation techniques can be applied.

In the preferred embodiment of the invention, a time/frequency spectrogram is provided. This is an invaluable aid in selecting the time constraints and the frequency bands for the interpolation, for example by specifying start and finish times and upper and lower frequency values which define a rectangle surrounding the unwanted sound or noise in the spectrogram. The methods of the invention may also advantageously apply to other time and/or frequency constraints, for example using variable time and/or frequency constraints which define portions of a spectrogram which are not rectangular.

In a preferred embodiment of the invention, the constrained region does not have to contain one simple frequency band; it can comprise several bands if necessary. In addition, it is not necessary for the unwanted signal samples to be contiguous in time; they can occupy several unadjoining regions. This is advantageous because successive interpolations of simple regions, which may be required to treat unwanted signal samples which are, for example, in the same or overlapping frequency bands and separated only by short time intervals, may give sub-optimal results due to dependencies built up between the interpolations. A single application of this embodiment of the invention may advantageously avoid this build up of dependencies by interpolating all the regions simultaneously.

In a preferred embodiment of the interpolation method of the invention, time and frequency constraints are selected which define a region of the audio recording containing the unwanted sound or noise (in which the unwanted signal is superimposed on the portion of the desired audio recording within the selected region) and which exclude the surrounding portion of the desired audio recording (the good signal). A mathematical model is then derived which describes the good data surrounding the unwanted signal. A second mathematical model is derived which describes the unwanted signal. This second model is constrained to have zero values outside the selected temporal region (outside the selected time constraints). Each of the models incorporates an independent excitation signal. The observed signal can be treated as the sum of the good signal plus the unwanted signal, with the good signal and the unwanted signal having unknown values in the selected temporal region. This can be expressed as a set of equations that can be solved analytically to find an interpolated estimate of the unknown good signal (within the selected region) that minimises the sum of the powers of the excitation signals.

In this embodiment of the invention, the relationship between the two models determines how much interpolation is applied at each frequency. By giving the model for the unwanted signal a spectrally-banded structure that follows the one or more selected frequency bands, this embodiment constrains the interpolation to affect the bands without adversely affecting the surrounding audio (subject to frequency resolution limits). A user parameter varies the relative intensities of the models in the bands, thus controlling how much interpolation is performed within the bands.

The preferred mathematical model to use in this embodiment is an autoregressive or "AR" model. However, an "AR" model plus "basis vector" model may also be used for either model (for either signal). These models are described in the book 'Digital Audio Restoration', the relevant pages of which are included below.

Because of the nature of the analytical solutions referred to above, the embodiment in the preceding paragraphs will not interpolate the noise content of the or each selected band or sub-band. The minimised excitation signals do not necessarily form 'typical' sequences for the models, and this can alter the perceived effect of each interpolation. This deficiency is most noticeable in noisy regions because the uncorrelated nature of noise means that the minimised excitation signal has too little power to be 'typical'. The result of this may be an audible hole in the interpolated signal. This occurs wherever the interpolated signal spectrogram decays to zero due to inadequate excitation.

The conventional method to correct this problem proceeds on the assumption that the excitation signals driving the models are independent Gaussian white noise signals of a known power. The method therefore adds a correcting signal to the excitation signal in order to ensure that it is 'white' and of the correct power. Inherent inaccuracies in the models mean that, in practice, the excitation signals are seldom white. This method may therefore be inadequate in many cases.

A preferred implementation provided in a further aspect of the invention extends the equations for the interpolator to incorporate a template signal for the interpolated region. The solution for these extended equations converges on the template signal (as described below) in the frequency bands where the solution would otherwise have decayed to zero. A user parameter may advantageously be used to scale the template signal, adjusting the amount of the template signal that appears in the interpolated solution.

In this implementation, the template signal is calculated to be noise with the same spectral power as the surrounding good signal but with random phase. Analysis shows that this is equivalent to adding a non-white correcting factor to generate a more 'typical' excitation signal.

This eliminates a flaw in existing methods which manifests itself as a loss of energy in the interpolation such that the signal power spectrum decays inappropriately in parts of the interpolated region.

A different implementation could use an arbitrary template signal, in which case the interpolation would in effect replace the frequency bands in the original signal with their equivalent portions from the template signal.

A further, less preferred, embodiment of the invention applies a filter to split the signal into two separate signals: one approximating the signal inside a frequency band or bands (containing the unwanted sounds) and one approximating the signal outside the band or bands. Time and frequency constraints may be selected on a spectrogram in order to specify the portion(s) of the signal containing the unwanted sound, as described above. A conventional unconstrained (in frequency) interpolation can then be performed on the signal containing the unwanted sound(s) (the sub-band frequencies). Subsequently, the two signals can be combined to create a resulting signal that has had the interpolations confined to the band containing the unwanted sound. Ideally, the band-split filter may be of the 'linear phase' variety, which ensures that the two signals can be summed coherently to create the interpolated signal. This method has one significant drawback in that the action of filtering spreads the unwanted sound in time. The time constraints of the interpolator must therefore widen to account for this spread, thereby affecting more of the audio than would otherwise be necessary. The preferred embodiment of the invention, as described previously, includes the frequency constraints as a fundamental part of the interpolation algorithm and therefore avoids this problem.
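By way of illustration only, the following Python sketch outlines this less preferred band-split embodiment. It is not taken from the specification: the linear-phase FIR design, the tap count, and the `interpolate_gap` routine (standing in for any conventional time-constrained interpolator) are all assumptions.

```python
import numpy as np
from scipy.signal import firwin

def bandsplit_interpolate(x, fs, f_lo, f_hi, t0, t1, interpolate_gap, taps=511):
    """Split x into in-band and out-of-band signals with a linear-phase FIR,
    interpolate the in-band gap, then recombine the two signals."""
    h = firwin(taps, [f_lo, f_hi], pass_zero=False, fs=fs)  # symmetric taps -> linear phase
    in_band = np.convolve(x, h, mode='same')    # 'same' centres the filter (no net delay)
    out_band = x - in_band                      # complementary signal, sums coherently
    # The filter spreads the unwanted sound in time, so widen the gap accordingly.
    spread = taps // 2
    lo, hi = max(0, t0 - spread), min(len(x), t1 + spread)
    in_band[lo:hi] = interpolate_gap(in_band, lo, hi)  # hypothetical interpolator
    return out_band + in_band
```

The widening of [t0, t1] by half the filter length mirrors the drawback noted above: more of the audio is touched than the preferred embodiment would require.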

DESCRIPTION OF A SPECIFIC EMBODIMENT OF THE INVENTION

Specific embodiments of the invention will now be described by way of example with reference to the accompanying drawings, in which:

FIG. 1 shows a spectrogram of an audio signal, plotted in terms of frequency vs. time and showing the full frequency range of the recorded audio signal;

FIG. 2 is an enlarged view of FIG. 1, showing frequencies up to 8000 Hz;

FIG. 3 shows the spectrogram of FIG. 2 with an area selected for unwanted sound removal;

FIG. 4 shows the spectrogram of FIG. 3 after unwanted sound removal;

FIG. 5 shows the spectrogram of FIG. 4 after removal of the markings showing the selected area;

FIGS. 6 to 13 show spectrograms illustrating a second example of unwanted sound removal;

FIG. 14 illustrates a computer system for recording audio;

FIG. 15 illustrates the estimation of spectrogram powers using Discrete Fast Fourier transforms;

FIG. 16 is a flow diagram of an embodiment of the invention;

FIG. 17 illustrates an autoregressive model;

FIG. 18 illustrates the combination of models embodying the invention in an interpolator; and

FIGS. 19 to 23 are reproductions of FIGS. 5.2 to 5.6 respectively of the book "Digital Audio Restoration" referred to herein.

Example 1 (referring to FIGS. 1, 2, 3, 4 and 5) shows an embodiment of the invention applied to an unwanted noise, probably a chair being moved, recorded during the decay of a piano note in a 'live' performance. The majority of the unwanted sound is contained in one band, or sub-band, of the spectrum, and it lasts for a duration of approximately 25,000 samples (approximately one half of a second). A single application of the invention removes the unwanted noise without any audible degradation of the wanted piano sound or of the ambient noise.

FIG. 1 shows a sample of the full frequency spectrum of the audio recording and FIG. 2 shows an enlarged portion, below about 8000 Hz. The start of the piano note 2 can be seen and, as it decays, only certain harmonics 4 of the note are sustained. The unwanted noise 6 overlies the decaying harmonics.

FIG. 3 shows the selection of an area of the spectrogram containing the unwanted sound, the area being defined in terms of selected time and frequency constraints 8, 10.

FIG. 3 also shows, as dotted lines, portions of the recorded signal within the selected frequency band but extended in time on either side of the selected area containing the unwanted sound. These areas, extending to selected time limits 12, are used to represent the good signal on which subsequent interpolation is based. FIG. 4 shows the spectrogram of FIG. 3 after interpolation to remove the unwanted sound, as described below. FIG. 5 shows the spectrogram after removal of the rectangles illustrating the time and frequency constraints.

Example 2 (FIGS. 6 to 13) shows an embodiment of the invention applied to the sound of a car horn that sounded and was recorded during the sound of a choir inhaling. The car horn sound is observed as comprising several distinct harmonics, the longest of which has a duration of about 40,000 samples (a little under one second). The sound of the indrawn breath has a strong noise-like characteristic and can be observed on the spectrogram as a general lifting of the noise floor. To eliminate the sound of the horn, each harmonic is marked as a separate sub-band and then replaced with audio that matches the surrounding breathy sound. Once all the harmonics have been marked and replaced, the resulting audio signal contains no audible residue from the car horn, and there is no audible degradation to the breath sound.

FIGS. 6 to 13 illustrate the removal of the unwanted car-horn sound in a series of steps, each using the same principles as the method illustrated in FIGS. 1 to 5. However, the car-horn comprises a number of distinct harmonics at different frequencies, each harmonic being sustained for a different period of time. Each harmonic is therefore removed individually.

FIG. 14 illustrates a computer system capable of recording audio, which can be used to capture the samples of the desired digital audio signal into a suitable format computer file. The computer system is implemented on a host computer 20 and comprises an audio input/output card 22 which receives audio data from a source 24. The audio input is passed via a processor 26 to a hard disc storage system 28. The recorded audio can then be output from the storage system via the processor and the audio output card to an output 30, as required.

The computer system will then display a time/frequency spectrogram of the audio (as in FIGS. 1 to 13). The time/frequency spectrogram displays a two dimensional colour image where the horizontal axis of the spectrogram represents time, the vertical axis represents frequency and the colour of each pixel in the image represents the calculated spectral power at the relevant time and frequency. The spectrogram powers can be estimated using successive overlapped windowed Discrete Fast Fourier transforms 40, see FIG. 15. The length of the Discrete Fast Fourier Transform determines the frequency resolution 42 in the vertical axis, and the amount of overlap determines the time resolution 44 in the horizontal axis. The colourisation of the spectrogram powers can be performed by mapping the powers onto a colour lookup table. For example the spectrogram powers can be mapped onto colours of variable hue but constant brightness and saturation. The operator can then graphically select the unwanted signal or part thereof by selecting a region on the spectrogram display.
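As an illustration of the spectrogram computation described above, the following Python sketch (not part of the specification; the Hann window, FFT length and hop size are arbitrary choices) estimates spectrogram powers with successive overlapped, windowed DFTs:

```python
import numpy as np

def spectrogram_power(x, fft_len=1024, hop=256):
    """Estimate spectrogram powers with successive overlapped, windowed DFTs.
    fft_len sets the frequency resolution; hop (the overlap) sets the time resolution."""
    window = np.hanning(fft_len)
    frames = range(0, len(x) - fft_len + 1, hop)
    power = np.empty((fft_len // 2 + 1, len(frames)))
    for col, start in enumerate(frames):
        spectrum = np.fft.rfft(window * x[start:start + fft_len])
        power[:, col] = np.abs(spectrum) ** 2
    return power  # rows = frequency, columns = time; map onto a colour lookup table for display
```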

The following embodiment can either reduce the signal in the selected region or replace it with a signal template synthesised from the surrounding audio. The embodiment has two parameters that determine how much synthesis and reduction are applied.

This method for replacing the signal proceeds as follows:

1. Derive an AR model for the good signal outside the constrained region, using the following steps:

- Calculate the coefficients of the AR model for the known good signal.
- Calculate the matrix representation of the AR model and partition it into parts corresponding to the unknown and known parts of the signal.

2. Postulate a signal that is constrained to lie in the selected frequency bands and derive an AR model for the unwanted signal from it, using the following steps:

- Create a power spectrum that has the value 1.0 in regions where the signal is inside the frequency bands and 0.0 where it lies outside.
- Calculate the autocorrelation of the unwanted signal from this power spectrum.
- Calculate the AR model for the unwanted signal, using the autocorrelation derived previously.
- Calculate the matrix representation of the AR model and partition it into parts corresponding to the unknown and known parts of the signal.

3. Calculate a template signal that has a power spectrum that matches the good signal, but that has a randomised phase. Scale this synthetic signal depending on how much synthesis the user has requested. From the synthesised signal and the matrix representation of the good AR model, calculate the synthetic excitation.

4. Estimate the unwanted signal outside the time constraints. In this implementation that estimate is zero.

5. Use the combined equations to calculate an estimate for the unknown data. This estimate will fulfil the requirement that the interpolation is constrained to affect only those frequencies within the selected bands but not affect those outside the selected bands.

The implementation will then redisplay the spectrogram so that the operator can see the effect of the interpolation (FIG. 5).

See below for a more detailed description of the equations used in each stage and diagrams of how they interact.

MORE DETAILED DESCRIPTION OF AN EXAMPLE IMPLEMENTATION OF THE INVENTION

The flow diagram in FIG. 16 shows the basic steps used in the interpolation of the set of signal samples. Each of these stages will now be described in more detail.

Model Assumptions

Sample Sets

The operator has selected N contiguous samples 60 from a discrete time signal, stored in an array of values y(t), 0 ≤ t < N. From this region the operator has selected a subset of these samples to be interpolated. We define the set T_u as the subset of N_u sample times selected by the operator for interpolation. We define the set T_k as the subset of N_k sample times (within the selected region but outside the subset T_u) not selected by the operator. The lengths of the two sets are related such that N = N_k + N_u. It is also desirable that there are at least twice as many samples in the set T_k as there are in T_u. Furthermore the operator has selected one or more frequency bands within which to apply the interpolation.

Observation Model

The signal y(t) is assumed to be formed from the sum of two independent signals, the good signal x(t) and an unwanted signal w(t). Therefore we have the following model for the observations

$$y(t) = x(t) + w(t) \quad (1)$$

or, in vector notation,

$$y = x + w \quad (2)$$

where

$$y = [y(0) \;\ldots\; y(N-1)]^T \quad (3)$$

$$x = [x(0) \;\ldots\; x(N-1)]^T \quad (4)$$

$$w = [w(0) \;\ldots\; w(N-1)]^T \quad (5)$$

We can further partition these vectors into those elements corresponding to the set of sample times T_u and those corresponding to the set of sample times T_k:

$$y_u = x_u + w_u \quad (6)$$

$$y_k = x_k + w_k \quad (7)$$

where

$$y_u = [y(t_0) \;\ldots\; y(t_{N_u-1})]^T, \quad t_j \in T_u \quad (8)$$

$$x_u = [x(t_0) \;\ldots\; x(t_{N_u-1})]^T, \quad t_j \in T_u \quad (9)$$

$$w_u = [w(t_0) \;\ldots\; w(t_{N_u-1})]^T, \quad t_j \in T_u \quad (10)$$

$$y_k = [y(t_0) \;\ldots\; y(t_{N_k-1})]^T, \quad t_j \in T_k \quad (11)$$

$$x_k = [x(t_0) \;\ldots\; x(t_{N_k-1})]^T, \quad t_j \in T_k \quad (12)$$

$$w_k = [w(t_0) \;\ldots\; w(t_{N_k-1})]^T, \quad t_j \in T_k \quad (13)$$

Obviously both y_u and y_k are known, as they form the observed signal values.

We stipulate that the values of x(t) and w(t) must be known a priori for the set of sample times T_k. Hence, in the case where the unwanted signal is zero in this region we get

$$w_k = 0 \quad (14)$$

$$x_k = y_k \quad (15)$$

We define our interpolation method as estimating the unknown values of x_u.

Deriving the AR Model for the Good Signal

The basic form of an AR model is shown in FIG. 17. Mathematically this is expressed for the good signal as

$$x(t) = e_x(t) - \sum_{i=1}^{P_a} a_i\, x(t-i), \quad P_a \le t < N \quad (16)$$

or in its alternate form

$$e_x(t) = \sum_{i=0}^{P_a} a_i\, x(t-i), \quad a_0 = 1 \quad (17)$$

where

- P_a is the order of the autoregressive model, typically of the order 25.

The autoregressive model is specified by the coefficients a_i; e_x(t) defines an excitation sequence that drives the model.

In this case we have to estimate the coefficients of the model only from the known values of x_k. It is sufficient for this purpose to create a new vector x_1(t) that assumes the unknown values of x(t) are zero:

$$x_1(t) = \begin{cases} 0, & t \in T_u \\ x(t), & t \in T_k \end{cases} \quad (18)$$

Solving for the AR Coefficients

There are several methods for calculating the model coefficients. This example uses the covariance method as follows:

Equation 16 can now be reformulated into a matrix form as

$$\begin{bmatrix} e_x(N-1) \\ \vdots \\ e_x(P_a) \end{bmatrix} = \begin{bmatrix} x_1(N-1) \\ \vdots \\ x_1(P_a) \end{bmatrix} + \begin{bmatrix} x_1(N-2) & \cdots & x_1(N-1-P_a) \\ \vdots & & \vdots \\ x_1(P_a-1) & \cdots & x_1(0) \end{bmatrix} \cdot \begin{bmatrix} a_1 \\ \vdots \\ a_{P_a} \end{bmatrix} \quad (19)$$

which can be expressed more compactly, with appropriate definitions of e_x, x_1, a and X_1, as

$$e_x = x_1 + X_1 \cdot a \quad (20)$$

The values of a that minimise the excitation energy

$$J_x = e_x^T e_x \quad (21)$$

can be calculated jointly using the formula

$$a = -(X_1^T X_1)^{-1} X_1^T x_1 \quad (22)$$

This minimum value of J_x, denoted J_{x min}, should also be calculated using equations 20 and 21.
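A minimal Python sketch of this covariance-method fit, assuming numpy and the conventions of equations 19 to 22 (x1 is the signal with the unknown samples set to zero, as in equation 18):

```python
import numpy as np

def ar_covariance(x1, order):
    """Covariance-method AR fit (equations 19-22): minimise the excitation
    energy e_x = x1 + X1 @ a over the coefficient vector a."""
    N = len(x1)
    # Row for time t holds x1(t-1) ... x1(t-order), for t = order .. N-1.
    X1 = np.column_stack([x1[order - i: N - i] for i in range(1, order + 1)])
    target = x1[order:N]
    a, *_ = np.linalg.lstsq(X1, -target, rcond=None)  # a = -(X1^T X1)^-1 X1^T x1
    e = target + X1 @ a
    return a, float(e @ e)  # coefficients and the minimum energy J_xmin
```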

Expressing the Model in Terms of the Known and Unknown Signal

Having calculated the model coefficients a, we can use equation 17 to express an alternative matrix representation of the model:

$$\begin{bmatrix} e_x(N-1) \\ \vdots \\ e_x(P_a) \end{bmatrix} = \begin{bmatrix} 1 & a_1 & \cdots & a_{P_a} & & 0 \\ & \ddots & \ddots & & \ddots & \\ 0 & & 1 & a_1 & \cdots & a_{P_a} \end{bmatrix} \cdot \begin{bmatrix} x(N-1) \\ \vdots \\ x(0) \end{bmatrix} \quad (23)$$

which can be expressed more compactly, with an appropriate definition of A, as

$$e_x = A \cdot x$$

This matrix equation can be partitioned into two parts as

$$e_x = A_u \cdot x_u + A_k \cdot x_k \quad (24)$$

where the matrix A_u is the submatrix of A formed by taking the columns of A appropriate to the unknown data x_u, and the matrix A_k is the submatrix of A formed by taking the columns of A appropriate to the known data x_k.
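The banded matrix A and its partition might be formed as follows. This is an illustrative sketch only, using natural (increasing) time ordering rather than the reversed ordering shown in equation 23:

```python
import numpy as np

def ar_matrix(a, N):
    """Banded matrix A with rows [1, a_1, ..., a_P] shifted along the signal,
    so that e_x = A @ x (equation 23, here with x in natural time order)."""
    P = len(a)
    coeffs = np.concatenate(([1.0], a))
    A = np.zeros((N - P, N))
    for row in range(N - P):
        A[row, row:row + P + 1] = coeffs[::-1]  # a_P at x(t-P), ..., 1 at x(t), t = row + P
    return A

def partition(A, unknown_idx, known_idx):
    """Split A into the columns acting on unknown and known samples (equation 24)."""
    return A[:, unknown_idx], A[:, known_idx]
```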

Deriving the AR Model for the Unwanted Signal

The model for the unwanted signal uses an AR model as in the good signal model. Mathematically this is expressed as

$$w(t) = e_w(t) - \sum_{i=1}^{P_b} b_i\, w(t-i), \quad P_b \le t < N \quad (25)$$

or in its alternate form

$$e_w(t) = \sum_{i=0}^{P_b} b_i\, w(t-i), \quad b_0 = 1 \quad (26)$$

where

- P_b is the order of the autoregressive model, with sufficiently high order to create a model constrained to lie in the selected frequency bands. For very narrow bands this is relatively trivial, but broader selected bands will typically require a model order of several hundred.

The autoregressive model is specified by the coefficients b_i; e_w(t) defines an excitation sequence that drives the model.

Solving for the AR Coefficients

The difficulty is in finding a model that adequately expresses the frequency constraints. One method is to create a hypothetical artificial waveform with the required band limited structure and then solve the model equations for this artificial waveform. Let this artificial waveform be w′(t). We can get a solution for the model coefficients purely by knowing the correlation function of this waveform:

$$r_{ww}(\tau) = E\{w'(t)\, w'(t-\tau)\} \quad (27)$$

Create an artificial power spectrum $W'(\bar{\omega})$ which has an amplitude of 1.0 inside the frequency bands and zero outside. Taking the inverse Discrete Fourier Transform of this power spectrum will give a suitable estimate for r_ww(τ).

The filter coefficients can be found from the following equation:

$$\begin{bmatrix} b_1 \\ \vdots \\ b_{P_b} \end{bmatrix} = -\begin{bmatrix} r_{ww}(0) & \cdots & r_{ww}(1-P_b) \\ \vdots & \ddots & \vdots \\ r_{ww}(P_b-1) & \cdots & r_{ww}(0) \end{bmatrix}^{-1} \cdot \begin{bmatrix} r_{ww}(1) \\ \vdots \\ r_{ww}(P_b) \end{bmatrix} \quad (28)$$

Furthermore the excitation power required for this artificial model can be calculated as:

$$J_{w\,\min} = r_{ww}(0) + \begin{bmatrix} b_1 \\ \vdots \\ b_{P_b} \end{bmatrix}^T \begin{bmatrix} r_{ww}(1) \\ \vdots \\ r_{ww}(P_b) \end{bmatrix} \quad (29)$$
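The following sketch assembles the band-limited AR model of equations 27 to 29. The FFT length and the use of scipy.linalg.solve_toeplitz are implementation choices, not taken from the specification:

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def band_ar_model(band_edges_hz, fs, order, n_fft=4096):
    """AR model for the unwanted signal: ideal band power spectrum ->
    autocorrelation (inverse DFT) -> Yule-Walker solve (equations 27-29)."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    W = np.zeros_like(freqs)
    for f_lo, f_hi in band_edges_hz:            # 1.0 inside each band, 0.0 outside
        W[(freqs >= f_lo) & (freqs <= f_hi)] = 1.0
    r = np.fft.irfft(W)[: order + 1]            # autocorrelation r_ww(0 .. P_b)
    b = -solve_toeplitz(r[:order], r[1: order + 1])   # equation 28
    j_w_min = r[0] + b @ r[1: order + 1]              # equation 29
    return b, j_w_min
```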

Expressing the Model in Terms of the Known and Unknown Signal

Having calculated the model coefficients b, we can use equation 26 to express an alternative matrix representation of the model:

$$\begin{bmatrix} e_w(N-1) \\ \vdots \\ e_w(P_b) \end{bmatrix} = \begin{bmatrix} 1 & b_1 & \cdots & b_{P_b} & & 0 \\ & \ddots & \ddots & & \ddots & \\ 0 & & 1 & b_1 & \cdots & b_{P_b} \end{bmatrix} \cdot \begin{bmatrix} w(N-1) \\ \vdots \\ w(0) \end{bmatrix} \quad (30)$$

which can be expressed more compactly, with an appropriate definition of B, as

$$e_w = B \cdot w$$

This matrix equation can be partitioned into two parts as

$$e_w = B_u \cdot w_u + B_k \cdot w_k \quad (31)$$

with suitable definitions of B_k and B_u.

We now use equation 1 to express equation 31 in terms of y and x:

$$e_w = B_u \cdot (y_u - x_u) + B_k \cdot (y_k - x_k) \quad (32)$$

In the case where w_k = 0 this collapses to

$$e_w = B_u \cdot (y_u - x_u) \quad (33)$$

The Template Signal

We calculate the template signal s from the known good data x_k as follows. We calculate the Discrete Fourier Transform $X_1(\bar{\omega})$ of the waveform x_1(t) defined in equation 18. We then create a synthetic spectrum $S_1(\bar{\omega})$ that has the same amplitude as $X_1(\bar{\omega})$ but uses pseudo-random phases. This spectrum is then inverted using the inverse Discrete Fourier Transform to give the template signal s. This has to be subsequently filtered by the good signal model to give a template excitation e_s as follows:

$$e_s = A s$$
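A possible realisation of this template construction, assuming x1 (equation 18) and the matrix A from the good-signal model are available; the fixed random seed is an illustrative choice:

```python
import numpy as np

def template_signal(x1, A, rng=np.random.default_rng(0)):
    """Template: same magnitude spectrum as x1 but pseudo-random phase,
    then filtered by the good-signal model to give e_s = A @ s."""
    X1 = np.fft.rfft(x1)
    phase = rng.uniform(0.0, 2.0 * np.pi, len(X1))
    S1 = np.abs(X1) * np.exp(1j * phase)
    S1[0] = np.abs(X1[0])           # keep the d.c. term real
    s = np.fft.irfft(S1, n=len(x1))
    return s, A @ s                 # template signal and its excitation e_s
```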

We hypothesise a new signal

$$\Delta x = x - \lambda s, \quad (34)$$

where λ is a user defined parameter that scales the template signal in order to increase or decrease its effect. This difference signal can itself be modelled by the good signal model:

$$\Delta e = e_x - \lambda e_s = A\, \Delta x \quad (35)$$

This can be expanded into

$$\Delta e = A_u \cdot x_u + A_k \cdot x_k - \lambda A s \quad (36)$$

The Interpolation Model

The diagram in FIG. 18 illustrates how all these models are brought together to create the interpolator. It now remains for us to create a cost function that brings all these aspects together, and then to minimise it with respect to the unknown samples x_u. The cost function we use is

$$J = \frac{\mu \cdot J_{x\,\min}}{J_{w\,\min}}\, e_w^T e_w + \Delta e^T \Delta e \quad (37)$$

where μ is a user defined parameter that controls how much interpolation is performed in the frequency bands. This equation can be modified by substituting

$$\mu' = \frac{\mu \cdot J_{x\,\min}}{J_{w\,\min}}, \qquad J = \mu'\, e_w^T e_w + \Delta e^T \Delta e.$$

Minimising this equation with respect to x_u leads to the following estimate $\hat{x}_u$ for x_u:

$$\hat{x}_u = (A_u^T A_u + \mu' B_u^T B_u)^{-1} \left( \mu' B_u^T B_u\, y_u - A_u^T A_k\, x_k + \lambda A_u^T e_s \right)$$
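Gathering the pieces, the closed-form estimate might be computed as below. The function name and the dense linear solve are illustrative assumptions; a practical implementation would exploit the banded structure of the matrices:

```python
import numpy as np

def estimate_unknown(y_u, x_k, A_u, A_k, B_u, e_s, lam, mu_prime):
    """Closed-form minimiser of J = mu' e_w^T e_w + de^T de:
    x_u = (A_u^T A_u + mu' B_u^T B_u)^-1
          (mu' B_u^T B_u y_u - A_u^T A_k x_k + lam A_u^T e_s)."""
    lhs = A_u.T @ A_u + mu_prime * (B_u.T @ B_u)
    rhs = (mu_prime * (B_u.T @ (B_u @ y_u))
           - A_u.T @ (A_k @ x_k)
           + lam * (A_u.T @ e_s))
    return np.linalg.solve(lhs, rhs)
```

Here mu_prime plays the role of μ′ (how much interpolation is performed in the bands) and lam the role of λ (how much of the template appears in the solution).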

Background Reference

The following pages show copies of pages 86 to 89, 111, and 114 to 116 of the book 'Digital Audio Restoration' referenced above.


The transfer function for this model is

$$H(z) = \frac{B(z)}{A(z)}$$

where $B(z) = \sum_{j=0}^{Q} b_j z^{-j}$ and $A(z) = 1 - \sum_{i=1}^{P} a_i z^{-i}$.

The model can be seen to consist of applying an IIR filter (see 2.5.1) to the 'excitation' or 'innovation' sequence {e_n}, which is i.i.d. noise. Generalisations to the model could include the addition of further deterministic input signals (the ARMAX model [114, 21]) or the inclusion of linear basis functions in the same way as for the general linear model: y = x + Gθ.

An important special case of the ARMA model is the autoregressive (AR) or 'all-pole' (since the transfer function has poles only) model in which B(z)=1. This model is used considerably throughout the text and is considered in the next section.

4.3 Autoregressive (AR) Modelling

A time series model which is fundamental to much of the work in this book is the autoregressive (AR) model, in which the data is modelled as the output of an all-pole filter excited by white noise. This model formulation is a special case of the innovations representation for a stationary random signal in which the signal {x_n} is modelled as the output of a linear time invariant filter driven by white noise. In the AR case the filtering operation is restricted to a weighted sum of past output values and a white noise innovations input {e_n}:

$$x_n = \sum_{i=1}^{P} a_i x_{n-i} + e_n. \quad (4.41)$$
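For concreteness, one realisation of a low-order AR process can be drawn by filtering white noise through the corresponding all-pole filter, e.g. (illustrative parameters only, not from the book):

```python
import numpy as np
from scipy.signal import lfilter

# One realisation of the AR(2) process x_n = a1*x_{n-1} + a2*x_{n-2} + e_n,
# obtained by filtering white noise through the all-pole filter 1/A(z).
a = np.array([1.3, -0.8])                       # stable illustrative AR parameters
e = np.random.default_rng(1).standard_normal(1000)
x = lfilter([1.0], np.concatenate(([1.0], -a)), e)
```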

The coefficients {a_i; i=1 . . . P} are the filter coefficients of the all-pole filter, henceforth referred to as the AR parameters, and P, the number of coefficients, is the order of the AR process. The AR model formulation is closely related to the linear prediction framework used in many fields of signal processing (see e.g. [174, 119]). AR modelling has some very useful properties as will be seen later and these will often lead to simple analytical results where a more general model such as the ARMA model (see previous section) does not. In addition, the AR model has a reasonable basis as a source-filter model for the physical sound production process in many speech and audio signals [156, 187].


4.3.1 Statistical Modelling and Estimation of AR Models

If the probability distribution function p_e(e_n) for the innovation process is known, it is possible to incorporate the AR process into a statistical framework for classification and estimation problems. A straightforward change of variable from x_n to e_n gives us the distribution for x_n conditional on the previous P data values as

$$p(x_n \mid x_{n-1}, x_{n-2}, \ldots, x_{n-P}) = p_e\!\left( x_n - \sum_{i=1}^{P} a_i x_{n-i} \right) \quad (4.42)$$

Since the excitation sequence is i.i.d. we can write the joint probability for a contiguous block of N−P data samples x_{P+1} . . . x_N conditional upon the first P samples x_1 . . . x_P as

$$p(x_{P+1}, x_{P+2}, \ldots, x_N \mid x_1, x_2, \ldots, x_P) = \prod_{n=P+1}^{N} p_e\!\left( x_n - \sum_{i=1}^{P} a_i x_{n-i} \right) \quad (4.43)$$

This is now expressed in matrix-vector notation. The data samples x_1, . . . , x_N and parameters a_1, a_2, . . . , a_{P−1}, a_P are written as column vectors of length N and P, respectively:

$$x = [x_1\; x_2\; \ldots\; x_N]^T, \quad a = [a_1\; a_2\; \ldots\; a_{P-1}\; a_P]^T \quad (4.44)$$

x is partitioned into x_0, which contains the first P samples x_1, . . . , x_P, and x_1, which contains the remaining (N−P) samples x_{P+1} . . . x_N:

$$x_0 = [x_1\; x_2\; \ldots\; x_P]^T, \quad x_1 = [x_{P+1}\; \ldots\; x_N]^T \quad (4.45)$$

The AR modelling equation of (4.41) is now rewritten for the block of N data samples as

$$x_1 = G\,a + e \quad (4.46)$$

where e is the vector of (N−P) excitation values and the ((N−P)×P) matrix G is given by

$$G = \begin{bmatrix} x_P & x_{P-1} & \cdots & x_2 & x_1 \\ x_{P+1} & x_P & \cdots & x_3 & x_2 \\ \vdots & & \ddots & & \vdots \\ x_{N-2} & x_{N-3} & \cdots & x_{N-P} & x_{N-P-1} \\ x_{N-1} & x_{N-2} & \cdots & x_{N-P+1} & x_{N-P} \end{bmatrix} \quad (4.47)$$

The conditional probability expression (4.43) now becomes

$$p(x_1 \mid x_0, a) = p_e(x_1 - Ga) \quad (4.48)$$

and in the case of a zero-mean Gaussian excitation we obtain

$$p(x_1 \mid x_0, a) = \frac{1}{(2\pi\sigma_e^2)^{\frac{N-P}{2}}} \exp\!\left( -\frac{1}{2\sigma_e^2} (x_1 - Ga)^T (x_1 - Ga) \right) \quad (4.49)$$

Note that this introduces a variance parameter σ_e² which is in general unknown. The p.d.f. given is thus implicitly conditional on σ_e² as well as a and x_0.

The form of the modelling equation of (4.46) looks identical to that of the general linear parametric model used to illustrate previous sections (4.1). We have to bear in mind, however, that G here depends upon the data values themselves, which is reflected in the conditioning of the distribution of x_1 upon x_0. It can be argued that this conditioning becomes an insignificant 'end-effect' for N >> P [155] and we can then make an approximation to obtain the likelihood for x:

$$p(x \mid a) \approx p(x_1 \mid a, x_0), \quad N \gg P \quad (4.50)$$

How much greater than P N must be will in fact depend upon the pole positions of the AR process. Using this result an approximate ML estimator for a can be obtained by maximisation w.r.t. a, from which we obtain the well-known covariance estimate for the AR parameters,

$$a^{\text{Cov}} = (G^T G)^{-1} G^T x_1 \quad (4.51)$$

which is equivalent to a minimisation of the sum-squared prediction error over the block, $E = \sum_{i=P+1}^{N} e_i^2$, and has the same form as the ML parameter estimate in the general linear model.

Consider now an alternative form for the vector model equation (4.46) which will be used in subsequent work for Bayesian detection of clicks and interpolation of AR data:

$$e = A x \quad (4.52)$$

where A is the ((N−P)×N) matrix defined as

$$A = \begin{bmatrix} -a_P & \cdots & -a_1 & 1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & -a_P & \cdots & -a_1 & 1 & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \ddots & \ddots & \ddots & \ddots & \vdots & \vdots \\ \cdots & 0 & 0 & -a_P & \cdots & -a_1 & 1 & 0 & 0 \\ 0 & \cdots & 0 & 0 & -a_P & \cdots & -a_1 & 1 & 0 \\ 0 & 0 & \cdots & 0 & 0 & -a_P & \cdots & -a_1 & 1 \end{bmatrix} \quad (4.53)$$

The conditional likelihood for white Gaussian excitation is then rewritten as:

$$p(x_1 \mid x_0, a) = \frac{1}{(2\pi\sigma_e^2)^{\frac{N-P}{2}}} \exp\!\left( -\frac{1}{2\sigma_e^2}\, x^T A^T A\, x \right) \quad (4.54)$$

In order to obtain the exact (i.e. not conditional upon x_0) likelihood we need the distribution p(x_0 | a), since

$$p(x \mid a) = p(x_1 \mid x_0, a)\, p(x_0 \mid a)$$

In appendix C this additional term is derived, and the exact likelihood for all elements of x is shown to require only a simple modification to the conditional likelihood, giving:

$$p(x \mid a) = \frac{1}{(2\pi\sigma_e^2)^{\frac{N}{2}} |M_{x_0}|^{1/2}} \exp\!\left( -\frac{1}{2\sigma_e^2}\, x^T M_x^{-1} x \right) \quad (4.55)$$

where

$$M_x^{-1} = A^T A + \begin{bmatrix} M_{x_0}^{-1} & 0 \\ 0 & 0 \end{bmatrix} \quad (4.56)$$

and M_{x_0} is the autocovariance matrix for P samples of data drawn from AR process a with unit variance excitation. Note that this result relies on the assumption of a stable AR process. As seen in the appendix, M_{x_0}^{-1} is straightforwardly obtained in terms of the AR coefficients for any given stable AR model a. In problems where the AR parameters are known beforehand but certain data elements are unknown or missing, as in click removal or interpolation problems, it is thus simple to incorporate the true likelihood function in calculations. In practice it will often not be necessary to use the exact likelihood since it will be reasonable to fix at least P 'known' data samples at the start of any data block. In this case the conditional likelihood (4.54) is the required quantity. Where P samples cannot be fixed it will be necessary to use the exact likelihood expression (4.55), as the conditional likelihood will perform badly in estimating missing data points within x_0.

While the exact likelihood is quite easy to incorporate in missing data or interpolation problems with known a, it is much more difficult to use for AR parameter estimation, since the functions to maximise are non-linear in the parameters a. Hence the linearising approximation of equation (4.50) will usually be adopted for the likelihood when the parameters are unknown.

In this section we have shown how to calculate exact and approximate likelihoods for AR data, in two different forms: one as a quadratic form in the data x and another as a quadratic (or approximately quadratic) form in the parameters a. This likelihood will appear on many subsequent occasions throughout the book.


5.2.3.1 Pitch-Based Extension to the AR Interpolator

Vaseghi and Rayner [191] propose an extended AR model to take account of signals with long-term correlation structure, such as voiced speech, singing or near-periodic music. The model, which is similar to the long term prediction schemes used in some speech coders, introduces extra predictor parameters around the pitch period T, so that the AR model equation is modified to:

$$x_n = \sum_{i=1}^{P} a_i x_{n-i} + \sum_{j=-Q}^{Q} b_j x_{n-T-j} + e_n, \quad (5.16)$$

where Q is typically smaller than P. Least squares/ML interpolation using this model is of a similar form to the standard LSAR interpolator, and parameter estimation is straightforwardly derived as an extension of standard AR parameter estimation methods (see section 4.3.1). The method gives a useful extra degree of support from adjacent pitch periods which can only be obtained using very high model orders in the standard AR case. As a result, the 'under-prediction' sometimes observed when interpolating long gaps is improved. Of course, an estimate of T is required, but results are quite robust to errors in this. Veldhuis [192, chapter 4] presents a special case of this interpolation method in which the signal is modelled by one single 'prediction' element at the pitch period (i.e. Q=0 and P=0 in the above equation).

5.2.3.2 Interpolation with an AR+Basis Function Representation

A simple extension of the AR-based interpolator modifies the signal model to include some deterministic basis functions, such as sinusoids or wavelets. Often it will be possible to model most of the signal energy using the deterministic basis, while the AR model captures the correlation structure of the residual. The sinusoid+residual model, for example, has been applied successfully by various researchers, see e.g. [169, 158, 165, 66]. The model for x_n with AR residual can be written as:

$$x_n = \sum_{i=1}^{Q} c_i\, \psi_i[n] + r_n, \quad \text{where} \quad r_n = \sum_{i=1}^{P} a_i r_{n-i} + e_n$$

Here ψ_i[n] is the nth element of the ith basis vector ψ_i and r_n is the residual, which is modelled as an AR process in the usual way. For example, with a sinusoidal basis we might take ψ_{2i−1}[n] = cos(ω_i nT) and ψ_{2i}[n] = sin(ω_i nT), where ω_i is the ith sinusoid frequency. Another simple example of basis functions would be a d.c. offset or polynomial trend. These can be incorporated within exactly the same model and hence the interpolator presented here is a means for dealing also with non-zero mean or smooth underlying trends.

If we assume for the moment that the set of basis vectors {ψ_i} is fixed and known for a particular data vector x, then the LSAR interpolator can easily be extended to cover this case. The unknowns are now augmented by the basis coefficients, {c_i}. Define c as a column vector containing the c_i's and a (N×Q) matrix G such that x = Gc + r, where r is the vector of residual samples. The columns of G are the basis vectors, i.e. G = [ψ_1 . . . ψ_Q]. The excitation sequence can then be written in terms of x and c as e = A(x − Gc), which is the same form as for the general linear model (see section 4.1). As before the solution can easily be obtained from least squares, ML and MAP criteria, and the solutions will be equivalent in most cases. We consider here the least squares solution which minimises e^T e as before, but this time with respect to both x_(i) and c, leading to the following estimate:

$$\begin{bmatrix} x_{(i)} \\ c \end{bmatrix} = \begin{bmatrix} A_{(i)}^T A_{(i)} & -A_{(i)}^T A G \\ -G^T A^T A_{(i)} & G^T A^T A G \end{bmatrix}^{-1} \begin{bmatrix} -A_{(i)}^T A_{-(i)}\, x_{-(i)} \\ G^T A^T A_{-(i)}\, x_{-(i)} \end{bmatrix} \quad (5.17)$$
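Equation 5.17 can be solved directly as a block linear system. The following sketch is an illustration under the stated definitions (A as in equation 4.53, G the basis matrix), not code from the book:

```python
import numpy as np

def ar_basis_interpolate(x, missing, A, G):
    """Joint LS estimate of the missing samples x_(i) and basis coefficients c
    for the AR + basis model e = A(x - Gc) (equation 5.17)."""
    known = np.setdiff1d(np.arange(len(x)), missing)
    A_i, A_k = A[:, missing], A[:, known]
    AG = A @ G
    lhs = np.vstack([np.hstack([A_i.T @ A_i, -A_i.T @ AG]),
                     np.hstack([-AG.T @ A_i, AG.T @ AG])])
    rhs = np.concatenate([-A_i.T @ (A_k @ x[known]),
                          AG.T @ (A_k @ x[known])])
    sol = np.linalg.solve(lhs, rhs)
    return sol[: len(missing)], sol[len(missing):]  # x_(i) estimate, c estimate
```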

This extended version of the interpolator reduces to the standard interpolator when the number of basis vectors, Q, is equal to zero. If we back-substitute for c in (5.17), the following expression is obtained for x_(i)


alone:

$$x_{(i)} = -\left( A_{(i)}^T \left( I - AG (G^T A^T A G)^{-1} G^T A^T \right) A_{(i)} \right)^{-1} \left( A_{(i)}^T A_{-(i)}\, x_{-(i)} - A_{(i)}^T A G (G^T A^T A G)^{-1} G^T A^T A_{-(i)}\, x_{-(i)} \right)$$

These two representations are equivalent to both the maximum likelihood (ML) and maximum a posteriori (MAP)¹ interpolator under the same conditions as the standard AR interpolator, i.e. that no missing samples occur in the first P samples of the data vector. In cases where missing data does occur in the first P samples, a similar adaptation to the algorithm can be made as for the pure AR case. The modified interpolator involves some extra computation in estimating the basis coefficients, but as for the pure AR case many of the terms can be efficiently calculated by utilising the banded structure of the matrix A.

¹ assuming a uniform prior distribution for the basis coefficients

We do not address the issue of basis function selection here. Multiscale and 'elementary waveform' representations such as wavelet bases may capture the non-stationary nature of audio signals, while a sinusoidal basis is likely to capture the character of voiced speech and the steady-state section of musical notes. Some combination of the two may well provide a good match to general audio. Procedures have been devised for selection of the number and frequency of sinusoidal basis vectors in the speech and audio literature [127, 45, 66] which involve various peak tracking and selection strategies in the discrete Fourier domain. More sophisticated and certainly more computationally intensive methods might adopt a time domain model selection strategy for selection of appropriate basis functions from some large 'pool' of candidates. A Bayesian approach would be a strong possibility for this task, employing some of the powerful Monte Carlo variable selection methods which are now available [65, 108]. Similar issues of iterative AR parameter estimation apply as for the standard AR interpolator in the AR plus basis function interpolation scheme.

5.2.3.2.1 Example: Sinusoid+AR Residual Interpolation

As a simple example of how the inclusion of deterministic basis vectors can help in restoration performance we consider the interpolation of a short section of brass music, which has a strongly 'voiced' character, see FIG. 5.2. FIG. 5.3 shows the same data with three missing sections, each of length 100 samples. This was used as the initialisation for the interpolation algorithm. Firstly a sinusoid+AR interpolation was applied, using 25 sinusoidal basis frequencies and an AR residual with order P=15. The algorithm used was iterative, re-estimating the AR parameters, sinusoidal frequencies and missing data points at each step. The sinusoidal frequencies


are estimated rather crudely at each step by simply selecting the 25 frequencies in the DFT of the interpolated data which have largest magnitude. The number of iterations was 5. FIG. 5.4 shows the resulting interpolated data, which can be seen to be a very effective reconstruction of the original uncorrupted data. Compare this with interpolation using an AR model of order 40 (chosen to match the 25+15 parameters of the sin+AR interpolation), as shown in FIG. 5.5, in which the data is under-predicted quite severely over the missing sections. Finally, a zoomed-in comparison of the two methods over a short section of the same data is given in FIG. 5.6, showing more clearly the way in which the AR interpolator under-performs compared with the sin+AR interpolator.

5.2.3.3 Random Sampling Methods

A further modification to the LSAR method is concerned with the characteristics of the excitation signal. We notice that the LSAR procedure seeks to minimise the excitation energy of the signal, irrespective of its time domain autocorrelation. This is quite correct, and desirable mathematical properties result. However, FIG. 5.8 shows that the resulting excitation signal corresponding to the corrupted region can be correlated and well below the level of surrounding excitation. As a result, the 'most probable' interpolants may under-predict the true signal levels and be over-smooth compared with the surrounding signal. In other words, ML/MAP procedures do not necessarily generate interpolants which are typical for the underlying model, which is an important factor in the perceived effect of the restoration. Rayner and Godsill [161] have devised a method which addresses this problem. Instead of minimising the excitation energy, we consider interpolants with constant excitation energy. The excitation energy may be expressed as:

$$E = (x_{(i)} - x_{(i)}^{LS})^T A_{(i)}^T A_{(i)} (x_{(i)} - x_{(i)}^{LS}) + E_{LS}, \quad E > E_{LS}, \quad (5.18)$$

where E_LS is the excitation energy corresponding to the LSAR estimate x_(i)^LS. The positive definite matrix A_(i)^T A_(i) can be factorised into 'square roots' by Cholesky or any other suitable matrix decomposition [86] to give A_(i)^T A_(i) = M^T M, where M is a non-singular square matrix. A transformation of variables u = M(x_(i) − x_(i)^LS) then serves to de-correlate the missing data samples, simplifying equation (5.18) to:

$$E = u^T u + E_{LS}, \quad (5.19)$$

from which it can be seen that the (non-unique) solutions with constant excitation energy correspond to vectors u with constant L₂-norm. The resulting interpolant can be obtained by the inverse transformation x_(i) = M^{-1} u + x_(i)^{LS}.
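A sketch of this constant-excitation-energy construction, assuming the LSAR estimate x_(i)^LS and the submatrix A_(i) have already been computed; the random direction chosen for u is an illustrative assumption:

```python
import numpy as np

def constant_energy_interpolant(x_ls, A_i, extra_energy, rng=np.random.default_rng(2)):
    """Interpolant with fixed excitation energy E = E_LS + extra_energy
    (equations 5.18-5.19): pick u with constant L2 norm, map back through M^-1."""
    M = np.linalg.cholesky(A_i.T @ A_i).T            # A_i^T A_i = M^T M, M upper-triangular
    u = rng.standard_normal(len(x_ls))
    u *= np.sqrt(extra_energy) / np.linalg.norm(u)   # enforce ||u||^2 = extra_energy
    return x_ls + np.linalg.solve(M, u)              # x_(i) = M^-1 u + x_(i)^LS
```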

1-28. (canceled)
29. The method according to claim 25, wherein the set of samples within the selected portion or portions of the signal is assumed to be corrupted by a disturbance which is modeled by a model, in which the model is used to constrain the interpolation of the set of samples such that the interpolated signal affects the signal spectrum inside the selected constraint(s).

30. The method according to claim 29, in which the model is an autoregressive (AR) model.

31. The method according to claim 30, wherein the signal is modelled by an AR model, characterised by the interaction of the signal model and the disturbance model being used to constrain the interpolation of the set of samples such that the interpolated signal affects the signal spectrum inside the selected constraint(s).

32. The method according to claim 25, in which an AR model is used to interpolate an estimate for a set of samples in a signal from the surrounding samples, characterised by applying a modification to excitation signal equations that makes the interpolated signal converge to a chosen template signal.

33. The method according to claim 25, comprising the step of selecting one or more constrained regions, each being defined in terms of an upper and a lower time bound and an upper and a lower frequency bound, the bounds being independently adjustable so as to define a portion or region of the audio signal that contains at least part of the unwanted sound.

34. The method according to claim 25, comprising the step of replacing an audio signal within the or each constrained portion or region such that the unwanted sound is attenuated or removed while the audio signal outside the constrained portion or region is not adversely affected.

35. The method according to claim 25, characterized by the use of band pass filtering to split the signal into two or more signals representing the signal inside the frequency constraint(s) or band(s) and the signal outside the frequency band(s), interpolating one or more of these band passed signals, and recombining the band passed signals so as to form an interpolated estimate of the original spectrum inside the selected band(s).

36. The method according to claim 35, characterized by the interpolated signal being constrained to converge to the chosen template signal within the selected constraint(s).

37. The method according to claim 36, characterised by the template signal having the same power spectrum as the surrounding signal, thereby preventing inappropriate power loss in the interpolated data.

38. The method according to claim 25, characterised by the simultaneous interpolation of two or more discrete frequency constraints or bands, and/or two or more discrete regions bounded by time constraints.

39. The method according to claim 35, characterized by the use of a spectrogram to define the discrete frequency constraints and to define the set of samples to be interpolated and to define the set of samples that are used to derive the signal model.

40. The method according to claim 25, in which the interpolated signal affects the signal spectrum inside the selected constraint(s) without adversely affecting the signal spectrum that lies outside the selected constraint(s).

41. The method according to claim 25, in which the interpolated signal affects the signal spectrum that lies outside the selected constraint(s) without adversely affecting the signal spectrum inside the selected constraint(s).

42. The method according to claim 25, comprising the step of simultaneously interpolating two or more discrete frequency bands.

43. The method according to claim 25, in which a spectrogram is used to define the discrete frequency constraint or constraints and to define the set of samples to be interpolated and to define the set of samples that are used to derive the signal model.

44. A method for the attenuation or removal of an unwanted sound from an audio signal, in which an AR model is used to interpolate an estimate for a set of samples in the signal from the surrounding samples, in which a modification is applied to the excitation signal equations that makes the interpolated signal converge to a chosen template signal.

45. The method according to claim 44, in which the template signal has the same power spectrum as the surrounding signal.

46. The method according to claim 44, in which the interpolated signal is constrained to converge to the chosen template signal within the selected constraints or band(s).

47. The method according to claim 46, in which the template signal has the same power spectrum as the surrounding signal, thereby preventing inappropriate power loss in the interpolated data.

48. The method according to claim 44, comprising the step of simultaneously interpolating two or more discrete frequency bands.

49. The method according to claim 44, in which a spectrogram is used to define the discrete frequency constraint or constraints and to define the set of samples to be interpolated and to define the set of samples that are used to derive the signal model.

50. The method according to claim 44, characterized by the interpolated signal being constrained to converge to the chosen template signal within the selected constraint(s).

51. The method according to claim 50, characterized by the template signal having the same power spectrum as the surrounding signal, thereby preventing inappropriate power loss in the interpolated data.

52. The method according to claim 44, characterized by the simultaneous interpolation of two or more discrete frequency constraints or bands, and/or two or more discrete regions bounded by time constraints.

53. The method according to claim 50, characterized by the use of a spectrogram to define the discrete frequency constraints and to define the set of samples to be interpolated and to define the set of samples that are used to derive the signal model.

54. The method according to claim 44, in which the interpolated signal affects the signal spectrum inside the selected constraint(s) without adversely affecting the signal spectrum that lies outside the selected constraint(s).

55. The method according to claim 44, in which the interpolated signal affects the signal spectrum that lies outside the selected constraint(s) without adversely affecting the signal spectrum inside the selected constraint(s).

56. A method in which mathematical techniques are used to interpolate an estimate for a set of samples in a digital audio signal from surrounding samples, characterised by the selection of one or more frequency constraints or bands, these being used to constrain the interpolation of the set of samples such that the interpolated signal affects the signal spectrum inside the selected constraint(s) or band(s) without adversely affecting the signal spectrum that lies outside the selected constraint(s) or band(s).

57. A method in which mathematical techniques are used to interpolate an estimate for a set of samples in a digital audio signal from surrounding samples, characterised by the selection of one or more frequency constraints or bands, these being used to constrain the interpolation of the set of samples such that the interpolated signal affects the signal spectrum that lies outside the selected constraint(s) or band(s) without adversely affecting the signal spectrum inside the selected constraint(s) or band(s).

58. A method in which an AR model is used to interpolate an estimate for a set of samples in a digital audio signal from surrounding samples, characterised by applying a modification to the excitation signal equations that makes the interpolated signal converge to a chosen template signal.